
feat: cli OpenAI-compatible API response_format support#884

Open
markstur wants to merge 3 commits into generative-computing:main from markstur:issue_824

Conversation

@markstur
Contributor

@markstur markstur commented Apr 17, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

feat: cli OpenAI-compatible API `response_format` support

   - Added `JsonSchemaFormat` model to represent JSON schema definitions
   - Extended `ResponseFormat` to support `json_schema` type (in addition to existing `text` and `json_object`)
   - Used field alias to avoid conflict with Pydantic's `schema` method

   - Added `_json_schema_to_pydantic()` utility function to dynamically convert JSON schemas to Pydantic models
   - Updated `_build_model_options()` to exclude `response_format` from model options (handled separately)
   - Modified `make_chat_endpoint()` to:
     - Parse `response_format` from requests
     - Convert `json_schema` type to Pydantic models using the utility function
     - Detect if the serve function accepts a `format` parameter using `inspect.signature()`
     - Pass the generated Pydantic model as `format=` parameter to serve functions that support it
     - Handle backward compatibility with serve functions that don't accept `format`
   - Added proper error handling for invalid schemas

   - Test json_schema format is converted to Pydantic model and passed to serve
   - Test json_object format doesn't pass a schema
   - Test text format doesn't pass a schema
   - Test error handling for missing json_schema field
   - Test error handling for invalid JSON schemas
   - Test backward compatibility with serve functions without format parameter
   - Test optional fields in JSON schemas

When a client sends a request with `response_format.type = "json_schema"`, the server:
1. Extracts the JSON schema from `response_format.json_schema.schema`
2. Dynamically creates a Pydantic model from the schema
3. Passes it as the `format=` parameter to the serve function
4. The serve function can then use this for constrained decoding via Mellea's `instruct()` method

This maps OpenAI's `response_format` API to Mellea's native `format=` parameter for structured output.
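
For illustration, the schema-to-model conversion described above could be sketched with `pydantic.create_model` (a minimal, flat-schema sketch; the PR's actual `_json_schema_to_pydantic` may differ, and the name `json_schema_to_pydantic` here is illustrative):

```python
from typing import Any

from pydantic import create_model

# Minimal sketch: map a flat JSON schema's "properties" to Pydantic fields.
# Primitive types only; enum, $ref, nested objects, etc. are not handled.
_TYPE_MAP = {"string": str, "integer": int, "number": float, "boolean": bool}


def json_schema_to_pydantic(schema: dict[str, Any], name: str = "ResponseModel"):
    required = set(schema.get("required", []))
    fields = {}
    for prop, spec in schema.get("properties", {}).items():
        py_type = _TYPE_MAP.get(spec.get("type"), Any)
        # `...` marks the field required; optional fields default to None.
        fields[prop] = (py_type, ... if prop in required else None)
    return create_model(name, **fields)
```

In this sketch, the client's `response_format.json_schema.schema` payload would be fed through such a function before being passed as `format=` to the serve function.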

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation when the rest of the PR is populated)

Attribution

  • AI coding assistants used


Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
@markstur markstur requested a review from a team as a code owner April 17, 2026 23:37
@github-actions
Contributor

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@markstur markstur changed the title Issue 824 feat: cli OpenAI-compatible API response_format support Apr 17, 2026
@github-actions github-actions Bot added the enhancement New feature or request label Apr 17, 2026
Comment thread cli/serve/app.py
@@ -186,6 +289,7 @@ async def endpoint(request: ChatCompletionRequest):
created=created_timestamp,
stream_options=request.stream_options,
system_fingerprint=system_fingerprint,
Contributor

The non-streaming path (return ChatCompletion just below, line 297) returns output.value without validating against format_model. Suggest adding before that return (also needs import json at the top and ValidationError added to the pydantic import):

if format_model is not None and output.value is not None:
    try:
        format_model.model_validate(json.loads(output.value))
    except (json.JSONDecodeError, ValidationError) as e:
        return create_openai_error_response(
            status_code=400,
            message=f"Output does not match required schema: {e!s}",
            error_type="invalid_response_error",
        )

Contributor

I believe OpenAI responses can return output that is not valid for a given schema if things like token limits are hit. Do we want to match that behavior? Or should we always error on our side if the format isn't met?

Comment thread cli/serve/streaming.py
)
yield f"data: {chunk.model_dump_json()}\n\n"

# Validate format if format_model is provided
Contributor

Validation runs after all content chunks are already sent (lines 68–106), so the error arrives after the client has consumed the data. A few options:

  1. Buffer when format_model is set, validate, then stream or error before emitting anything.
  2. Return a 400 upfront when stream=True + json_schema — simplest for now.
  3. Keep post-hoc but document it — callers can pass format= to the backend for constrained decoding instead.

Contributor Author

Related to #891, right?

Comment thread cli/serve/app.py
)


def _json_schema_to_pydantic(
Contributor

This handles `type`, but will not handle enum, additionalProperties, nested types, array, $ref, allOf, or anyOf.

Suggest clarifying the caveats in comments, or figuring out whether any more validation is viable.
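
A hypothetical extension of the converter toward nested objects and typed arrays (enum, $ref, allOf/anyOf would still be unsupported, which is worth a code comment either way; the function name `to_pydantic` is illustrative):

```python
from typing import Any

from pydantic import create_model

_PRIMITIVES = {"string": str, "integer": int, "number": float, "boolean": bool}


def to_pydantic(schema: dict[str, Any], name: str = "Model"):
    """Recursive sketch: nested objects and typed arrays, nothing more."""
    required = set(schema.get("required", []))
    fields = {}
    for prop, spec in schema.get("properties", {}).items():
        t = spec.get("type")
        if t == "object":
            # Recurse into nested object schemas, giving each a unique name.
            py = to_pydantic(spec, name=f"{name}_{prop}")
        elif t == "array":
            items = spec.get("items", {})
            inner = (to_pydantic(items, f"{name}_{prop}_item")
                     if items.get("type") == "object"
                     else _PRIMITIVES.get(items.get("type"), Any))
            py = list[inner]
        else:
            py = _PRIMITIVES.get(t, Any)
        fields[prop] = (py, ... if prop in required else None)
    return create_model(name, **fields)
```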

Comment thread cli/serve/app.py

# Check if serve function accepts format parameter
serve_sig = inspect.signature(module.serve)
accepts_format = "format" in serve_sig.parameters
Contributor

Cacheable / could be done up front? Here it's done on every request, but the result won't change.
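
One hypothetical shape for that: inspect the signature once when the endpoint is built and close over the result (a sketch, not the PR's code; names are illustrative):

```python
import inspect


def make_endpoint(serve):
    # Computed once at endpoint-construction time, not on every request.
    accepts_format = "format" in inspect.signature(serve).parameters

    async def endpoint(messages, format_model=None):
        kwargs = {"input": messages}
        if accepts_format:
            kwargs["format"] = format_model
        return await serve(**kwargs)

    return endpoint
```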

Comment thread cli/serve/models.py
schema_: dict[str, Any] = Field(alias="schema")
"""JSON Schema definition."""

strict: bool | None = None
Contributor

`strict` is not used? See related comment: more is needed to really be strict, or at least to clarify the behaviour.

Comment thread cli/serve/app.py
accepts_format = "format" in serve_sig.parameters

# Detect if serve is async or sync and handle accordingly
if inspect.iscoroutinefunction(module.serve):
Contributor

@planetf1 planetf1 Apr 20, 2026

Similar (not identical) code is repeated multiple times; a possible opportunity for making it common. Minor.

Contributor

@jakelorocco jakelorocco left a comment

I have a broader question that is touched on in my comments below: If we trust our backend provider to properly handle our structured output requests, why do we do any validation on our side? (Because module.serve might do something funky?)

Comment thread cli/serve/app.py
Comment on lines +246 to +258
if accepts_format:
    output = await module.serve(
        input=request.messages,
        requirements=request.requirements,
        model_options=model_options,
        format=format_model,
    )
else:
    output = await module.serve(
        input=request.messages,
        requirements=request.requirements,
        model_options=model_options,
    )
Contributor

Can these calls be combined? If format defaults to None, is the expectation that module.serve handles that differently? Does module.serve default to a different format value?
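
One hypothetical way to merge the two branches: build the kwargs once and add `format` only when the serve function declares it (sketch only; `call_serve` is an illustrative name, not the PR's code):

```python
async def call_serve(serve, accepts_format, messages, requirements,
                     model_options, format_model):
    # Shared kwargs for both old- and new-style serve functions.
    kwargs = {
        "input": messages,
        "requirements": requirements,
        "model_options": model_options,
    }
    if accepts_format:
        # Only new-style serve functions receive the format parameter.
        kwargs["format"] = format_model
    return await serve(**kwargs)
```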



Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

m serve OpenAI API structured output

3 participants